-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement Trino source #585
Conversation
Thanks @domnikl for the PR. I think it in general looks great and complete!
Yes, I think this might be an issue in terms of performance. I don't have experience of using prusto before, but it seems like the
I'm not sure what exactly the bug is. Currently we are converting the It would be great to also submit the corresponding seed database script here so others can run the python test_trino locally. |
|
||
|
||
@pytest.fixture(scope="module") # type: ignore | ||
def mysql_url() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these functions should be rename to test_trino_xxx from test_mysql_xxx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy & paste error, I'll rename it 👍🏻
@wangxiaoying thanks for looking into it!
The problem is with reading timestamps with time zone types, prusto does not support it (yet) by the looks of it. Instead it returns
Will do in the next days and also rework fetching from the database with the |
@wangxiaoying I implemented the missing partitioning as well as switched to get/get_next for prusto and fixed the tests and added test data and squashed some bugs along the way. Could you have a look at it again please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we probably are not going to make trino in our CI on github workflow. Can you make all these tests skipped if TRINO_URL
is not set? Something similar to oracle here.
connectorx/src/sources/trino/mod.rs
Outdated
|
||
#[throws(TrinoSourceError)] | ||
fn get_total_rows(rt: Arc<Runtime>, client: Arc<Client>, query: &CXQuery<String>) -> usize { | ||
rt.block_on(client.get_all::<Row>(query.to_string())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this will execute the entire query and get all the results, which may be a bottleneck in terms of performance (since the query will be ran twice). Can we make it to using the SELECT COUNT(*)
query to get the total rows? We have a util function named count_query
. An example can be found here.
Hi @domnikl , thanks so much for completing the PR! I've tested the test cases locally and it seems works well. I have two minor comments above but in general I think the code looks good and complete. We probably also need to add a documentation page for trino here. I think after fixing the above minor issues, we can merge it and make it into our next release! |
@wangxiaoying thanks for having a look! Added the missing pieces and a bit of documentation for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @domnikl , the update looks good to me! We can merge this PR after the CI is passed : )
I'd really like to see Trino being implemented as a source in ConnectorX and thus have begun working on it. It uses prusto as a client and currently supports all basic types. Support for tuple, array, row and uuid isn't implemented yet and there's a bug with time columns not being mapped correctly. Also I want to look into fetching results more efficiently and not everything at once.
What do you think, is it worth continuing work on it? Is there anything else I need to consider eg. regarding mapping the results?